An Implementation of FFT, DCT, and
Other Transforms on the TMS320C30


Abstract 


This book describes several types of transforms and related
algorithms used on the TMS320C30 family of digital signal
processors. These include:


- The Fast Fourier Transforms (FFTs)
  - the complex radix-2 FFT
  - the complex radix-4 FFT
  - the real-valued radix-2 FFT
- The Discrete Hartley Transform (DHT)
- The Discrete Cosine Transform (DCT)


The book contains: 


- A description of transforms and their implementation on the
  TMS320C30 family of digital signal processors
- A description and comparison of the different kinds of
  transforms: the FFTs, the Hartley transform, and the Cosine
  transform
- A description of the features of the TMS320C30 that allow the
  efficient implementation of these algorithms
- Outlines of specific implementations, transforms, and
  TMS320C30 C Compiler facts
- Implementation issues
- Several graphics and tables detailing
  - Forms and flowgraphs of FFTs
  - Memory requirements for FFT and Hartley transforms
  - Differences in FFT and DCT timing


The end of the book contains 17 appendices with actual 
TMS320C30 source code for performing transforms. 


This report describes the implementation of several Fast Fourier Transforms (FFTs) 
and related algorithms on the TMS320C30. The TMS320C30 is the first device in the 
third generation of 32-bit floating-point Digital Signal Processors (DSPs) in the Texas 
Instruments TMS320 family. The algorithms considered here are the complex radix-2 FFT, 
the complex radix-4 FFT, the real-valued radix-2 FFT (both forward and inverse 
transforms), the Discrete Hartley Transform (DHT), and the Discrete Cosine Transform 
(DCT). These transforms have many applications, such as in image processing, sonar, 
and radar. 


The introduction briefly describes transforms and their implementation on the
TMS320 family of processors. Next, the different kinds of FFTs (including the real FFT),
the closely related Hartley transform, and the Cosine transform are described and
compared. This is followed by a description of the TMS320C30 features that permit efficient
implementations of these algorithms. Then, specific implementations, transforms, and
TMS320C30 C Compiler facts are outlined. Finally, the report discusses some
implementation issues, and the appendices list actual TMS320C30 code for performing transforms.


The powerful architecture and instruction set of the TMS320C30 permit flexible
and compact coding of the algorithms in assembly language while preserving close
correspondence to a high-level language implementation. The efficiency of the architecture
and the speed of the device make faster realization of real and complex transforms
possible. With the availability of a C compiler, these routines can be put in C-callable form
and used as faster versions of FFT C functions.


Introduction 


The Fast Fourier Transform (FFT) is an important tool used in Digital Signal
Processing (DSP) applications. Its development by Cooley and Tukey gave impetus to the
establishment of DSP as an independent discipline. The well-structured form of the FFT
has also made it one of the benchmarks in assessing the performance of number-crunching
devices and systems.


In recent years, because of the popularity of this signal-processing tool, there have
been efforts to improve its performance by advances both at the algorithmic level and
in hardware implementation. Researchers have been developing efficient algorithms to
increase the execution speed of FFTs while keeping requirements for memory size low.
At the same time, developers of VLSI systems are including features in their designs
that improve system performance for applications requiring FFTs. In particular,
single-chip programmable DSP devices, currently available or under development, can realize
FFTs with speeds that allow the implementation of very complex systems in real time.


The Texas Instruments TMS320 family consists of five generations of programmable
digital signal processors. The TMS32010 introduced the first generation, which today
encompasses more than twelve devices with various speeds, interfacing capabilities, and
price/performance combinations. FFT implementations on the TMS32010 can be found
in the appendix of the book by Burrus and Parks [1].


The second-generation TMS320 devices (the TMS32020, the TMS320C25, and their
spinoffs) enhanced the architecture and speed capabilities of the first generation. Examples
of FFT programs implemented on the TMS32020 can be found in an application report
in the book Digital Signal Processing Applications with the TMS320 Family [2]. Such
programs are easily extended to the TMS320C25 because of the code compatibility between
devices.


The architectural and speed improvements on the processors from one generation 
to the next have made the FFT computation faster and the programming easier. These 
advantages have reached a new high level in the third generation. The TMS320C30 is 
the first device in the third generation, and this report examines implementation of the 
FFT algorithms on it. The fourth generation (TMS320C4x) is a new set of floating-point 
devices, while the fifth generation (TMS320C5x) is a continuation of the fixed-point devices. 
Since software compatibility is maintained within the fixed-point and the floating-point 
devices, the existing FFT implementations will also be applicable to these new generations. 


The Fourier Transform of an analog signal x(t), given as

$$X(\omega) = \int_{-\infty}^{\infty} x(t)\, e^{-j\omega t}\, dt \qquad (1)$$

determines the frequency content of the signal x(t). In other words, for every frequency,
the Fourier transform X(ω) determines the contribution of a sinusoid of that frequency
in the composition of the signal x(t). For computations on a digital computer, the signal
x(t) is sampled at discrete-time instants. If the input signal is digitized, a sequence of numbers
x(n) is available instead of the continuous-time signal x(t). Then, the Fourier transform
takes the form

$$X(e^{j\omega}) = \sum_{n=-\infty}^{\infty} x(n)\, e^{-j\omega n} \qquad (2)$$

The resulting transform $X(e^{j\omega})$ is a periodic function of ω, and it needs to
be computed for only one period. The actual computation of the Fourier transform of a
stream of data presents difficulties because $X(e^{j\omega})$ is a continuous function in ω. Since
the transform must be computed at discrete points, the properties of the Fourier transform
led to the definition of the Discrete Fourier Transform (DFT), given by

$$X(k) = \sum_{n=0}^{N-1} x(n)\, e^{-j 2\pi nk/N} \qquad (3)$$

When x(n) consists of N points x(0), x(1), ..., x(N-1), the frequency-domain
representation is given by the set of N points X(k), k = 0, 1, ..., N-1. Equation (3) is often
written in the form

$$X(k) = \sum_{n=0}^{N-1} x(n)\, W_N^{nk} \qquad (4)$$

where $W_N^{nk} = e^{-j 2\pi nk/N}$. The factor $W_N$ is sometimes referred to as the twiddle factor.


A detailed description of the DFT can be found in references [1,3,4]. The computational
requirements of the DFT increase rapidly with increasing block size N, having an impact
on the real-time system performance. This problem was alleviated with the development
of special fast algorithms, collectively known as the Fast Fourier Transform (FFT). With an
FFT, the computational burden increases much less rapidly with N, and for any given
N, the FFT computational load, measured in terms of required multiplications and
additions, is smaller than a brute-force computation of the DFT.


The definition of the FFT is identical to the DFT: only the method of computation
differs. To achieve the efficiency of an FFT, it is important that N be a highly composite
number. Typically, the length N of the FFT is a power of 2, $N = 2^M$, and the whole
algorithm breaks down into a repeated application of an elementary transform known as
a butterfly. If N is not a power of 2, the sequence x(n) is appended with enough zeroes
to make the total length a power of 2. Again, references [1,3,4] contain a detailed
development of the FFT. Reference [2] also discusses the same topic.


Different Forms of the FFT 


Over the years, researchers have developed different forms of FFT for more
efficient computation. Special cases, such as those in which the input is a sequence of real
numbers, have been investigated, and even more sophisticated algorithms have been
developed. The general form of the FFT butterfly is given in Figure 1.

Figure 1. Radix-2 Butterfly for Decimation in Time


If the inputs to the butterfly are the two complex numbers P and Q, the outputs will
be the complex numbers P’ and Q’, such that

$$P' = P + Q\,W_N^k \qquad (5)$$

and

$$Q' = P - Q\,W_N^k \qquad (6)$$

The quantities P, Q, and P’, Q’ represent different points in the array being
transformed, and they may or may not occupy adjacent locations in that array. In an in-place
computation, the result P’ will overwrite P, and Q’ will overwrite Q. $W_N^k$ again represents
the twiddle factor, and its exponent is determined by the location of the corresponding
butterfly in the FFT algorithm.


Figure 2 shows an alternate form of the same FFT butterfly.

Figure 2. Alternate Form of Radix-2 Butterfly for Decimation in Time.


Although the notation is now less descriptive, it creates a clearer picture when several
butterflies are put together to form an FFT. Using the first notation, Figure 3 is the
flowgraph of an 8-point FFT example.

Figure 3. Example of 8-Point FFT with Decimation in Time.


Note that the input sequence x(n) is in the correct order, while the output X(k) is
scrambled. Actually, this scrambling occurs in a very systematic way, called bit-reversed
order: if you express the index of a point in binary and reverse the order of the bits,
the result is the position that this particular point occupies. For instance, X(3)
occupies the sixth position in the output (when counting from the zero position). In binary
form, $3_{10} = 011_2$, and if bit-reversed, you get $110_2 = 6_{10}$, which is the position that
X(3) occupies. It turns out that the third position is occupied by X(6), and to restore the
correct order at the output, you need only to swap these two numbers.


The same procedure can be repeated with all the scrambled numbers not occupying
the position that their index suggests. If the input sequence x(n) is rearranged to appear
in bit-reversed form, the output X(k) appears in the correct order, as shown in Figure 4.

Figure 4. Alternate Form of 8-Point FFT with Decimation in Time. The Input Is in
Bit-Reversed Order and the Output Is in the Correct Order.
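
The bit-reversed ordering described above can be sketched in C as shown here. The function names are hypothetical and the code does the reordering in software; on the TMS320C30 itself this step is normally absorbed by the bit-reversed addressing mode discussed later.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Reverse the lowest `bits` bits of `index` (illustrative helper). */
size_t bit_reverse(size_t index, unsigned bits)
{
    size_t r = 0;
    assert(bits < sizeof(size_t) * CHAR_BIT);
    for (unsigned b = 0; b < bits; b++) {
        r = (r << 1) | (index & 1u);   /* move the lowest bit of index into r */
        index >>= 1;
    }
    return r;
}

/* Reorder data[0 .. 2^bits - 1] into bit-reversed order, in place. */
void bit_reverse_permute(float *data, unsigned bits)
{
    size_t n = (size_t)1 << bits;
    for (size_t i = 0; i < n; i++) {
        size_t j = bit_reverse(i, bits);
        if (i < j) {                   /* swap each pair exactly once */
            float t = data[i];
            data[i] = data[j];
            data[j] = t;
        }
    }
}
```

For the 8-point example, `bit_reverse(3, 3)` gives 6 and `bit_reverse(6, 3)` gives 3, which is the swap of X(3) and X(6) mentioned in the text.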


Since the only difference between Figures 3 and 4 is a rearrangement of the
butterflies, the computational load and the final results are identical. In terms of
implementation, this rearrangement means that the nesting of the two innermost loops in the FFT
routine is interchanged.


The butterflies and the FFT configurations presented thus far implement the FFT
with a decimation in time. This terminology essentially describes a way of grouping the
terms of the DFT definition; see Equation (3). An alternative way of grouping the DFT
terms together is called decimation in frequency (DIF). Figures 5 and 6 show the same example
of an 8-point FFT using decimation in frequency: Figure 5 with the input in correct order
and the output in bit-reversed order, and Figure 6 vice versa.

Figure 5. Example of an 8-Point FFT with Decimation in Frequency.

Figure 6. Alternate Form of 8-Point FFT with Decimation in Frequency. The Input
Is in Bit-Reversed Order and the Output Is in the Correct Order


Pictorially, the difference between decimation in time and decimation in frequency 
is that the twiddle factor appears at the input of the butterfly in the first, and at the output 
in the second. Otherwise, the two methods are identical in terms of results. However, 
depending on what is the most convenient order of getting the twiddle factors and where 
the longest-span butterfly appears, you may prefer one method over the other. 


The butterfly shown in Figure 1 (or Figure 2) is the smallest element in a radix-2
FFT. The radix of the FFT represents the number of inputs that are combined in a
butterfly. The Fast Fourier Transform is usually explained around the radix-2 algorithm for
conceptual simplicity. If, however, higher-order radices are used, more computational
savings can be achieved. These savings increase with the radix, but there is very little
improvement above radix 4. That is why the radix-2 and radix-4 FFTs are the most
commonly used algorithms.


In radix-4 FFT, each butterfly has 4 inputs and 4 outputs, essentially combining
two stages of a radix-2 algorithm in one. Figure 7 shows this combination graphically.

Figure 7. Butterfly for Radix-4, Decimation-in-Time FFT.


Although four radix-2 butterflies are combined into one radix-4 butterfly, the
computational load of the latter is less than four times the load of a radix-2 butterfly.
Examples of radix-4, 16-point FFTs are shown in Figures 8 and 9 for decimation in time
and decimation in frequency, respectively.


Figure 8. Example of a 16-Point, Radix-4, Decimation-in-Time FFT.

Figure 9. Example of a 16-Point, Radix-4, Decimation-in-Frequency FFT.


These configurations take the incoming sequence in order and produce the frequency- 
domain result in digit-reversed form. It is a simple matter to rearrange the FFT and have 
the input in digit-reversed form and the output in order. 


Digit reversal is similar to bit reversal, except that the number whose digits are
reversed is written in base 4 (equal to the radix) rather than base 2. For example, the output
value X(14) in a 16-point, radix-4 FFT occupies position eleven (again starting from zero)
because $14_{10} = 32_4$, and, reversing the digits of the number, $23_4 = 11_{10}$. To restore the
output to the correct order, the contents of locations with digit-reversed indices should
be swapped. However, since the TMS320C30 has a special bit-reversed addressing mode,
it is desirable to have the output of the radix-4 computation in bit-reversed rather than
digit-reversed form. This is accomplished quite simply if, in each radix-4 butterfly, the
two middle output legs are interchanged. That is, whenever the output of the butterfly
is the four numbers A’, B’, C’, and D’, instead of storing them in that order, store them
in the order A’, C’, B’, and D’, as shown in Figure 10.

Figure 10. Radix-4 Butterflies. (a) Regularly-Ordered Output, (b) Bit-Reversed
Output.


References [5, 6] explain why this simple rearrangement puts the result in bit-reversed 
order. 
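
The relation between digit reversal and bit reversal can be checked numerically with a short C sketch. The helpers below are hypothetical names introduced for illustration. They rely on the observation that reversing the bits of an index is the same as reversing its base-4 digits and then swapping the two bits inside each digit; swapping the bits of a base-4 digit exchanges the values 1 and 2, which corresponds to exchanging the two middle legs of a radix-4 butterfly. This is meant only to make the rearrangement plausible, not to replace the argument in references [5, 6].

```c
#include <assert.h>
#include <stddef.h>

/* Reverse the lowest `bits` bits of `index`. */
size_t bit_reverse(size_t index, unsigned bits)
{
    size_t r = 0;
    for (unsigned b = 0; b < bits; b++) {
        r = (r << 1) | (index & 1u);
        index >>= 1;
    }
    return r;
}

/* Reverse the lowest `digits` base-4 digits of `index`. */
size_t digit_reverse4(size_t index, unsigned digits)
{
    size_t r = 0;
    assert(digits > 0);
    for (unsigned d = 0; d < digits; d++) {
        r = (r << 2) | (index & 3u);   /* move the lowest base-4 digit into r */
        index >>= 2;
    }
    return r;
}

/* Swap the two bits inside every base-4 digit (digit values 1 and 2 exchange). */
size_t swap_bits_in_digits(size_t v, unsigned digits)
{
    size_t r = 0;
    for (unsigned d = 0; d < digits; d++) {
        size_t dig = (v >> (2 * d)) & 3u;
        size_t swapped = ((dig & 1u) << 1) | (dig >> 1);
        r |= swapped << (2 * d);
    }
    return r;
}
```

For the example in the text, `digit_reverse4(14, 2)` returns 11, and applying `swap_bits_in_digits` to that result gives 7, which equals `bit_reverse(14, 4)`.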


Features of the TMS320C30 


The TMS320C30 is the first device introduced in the third generation of the TMS320
Digital Signal Processors [7,8]. It has many architectural features that permit very
efficient implementation of algorithms. Some of those features pertinent to the FFT
implementation are discussed in this section.


The two most salient characteristics of the TMS320C30 device are its high speed
(60-ns cycle time) and floating-point arithmetic. The higher speed makes the
implementation of real-time applications easier than in earlier processors, even when the other
architectural advantages are not considered. Each instruction executes in a single cycle under mild
pipeline restrictions. The device automatically takes care of any potential conflicts. The
pipeline should be observed closely (e.g., using the trace capability of the simulator) only
if code optimization for speed is required.


The floating-point capability permits the handling of numbers of high dynamic range
without concern for overflows. In FFT programs, in particular, the computed values tend
to increase from one stage to the next, as discussed in reference [2]. Fixed-point
arithmetic will then cause overflows if the incoming numbers are large enough and no
provisions are made for scaling. All these considerations are eliminated with the floating-point
capability of the TMS320C30. The TMS320C30 performs floating-point arithmetic with
the same speed as any fixed-point operation; no performance is sacrificed for this feature.


There are eight extended-precision registers, R0-R7, that can be used as
accumulators or general-purpose registers, and eight auxiliary registers, AR0-AR7, for
addressing and integer arithmetic. For many applications, these registers are sufficient
for temporary storage of values, and there is no need to use memory locations. This is
the case with the radix-2 FFT algorithm, where no locations are required other than those
holding the incoming data to be transformed. Also, arithmetic using these
registers greatly increases the programming efficiency. The two index registers, IR0 and
IR1, are used for indexing the contents of the auxiliary registers AR0-AR7, thus making
the access of the butterfly legs and the twiddle factors easy.


A powerful structure in the TMS320C30 is the block-repeat capability that has the
form


        RPTB    LABEL
        put instructions here
LABEL   last instruction


Whatever occurs after the RPTB instruction and up to the LABEL is repeated one
time more than the number included in the repeat counter register, RC. The RC register
must be initialized before entering the block-repeat construct. The net effect is that the
repeated code behaves as if it were straight-line coded (no penalty for looping), with
program size equal to the one in looped code. In this way, the FFT butterfly, being the core
of the program, can be implemented in a block-repeat form, thereby saving execution time
while preserving the clarity of the program and conserving program space.


A bit-reversed addressing mode is available to eliminate the need for swapping 
memory locations at the beginning or the end of the FFT (depending on the FFT type). 
When you use this addressing mode, you access a sequence of data points in bit-reversed 
order rather than sequentially, and you can recover the points in the correct order during 
retrieval of the data instead of spending extra cycles to accomplish it in software. 


Implementation of Radix-2 and Radix-4 Complex FFTs 


Because of the powerful architecture and the instruction set of the TMS320C30,
the assembly language program follows closely the flow of a high-level language
program; this makes it easy to read and debug. It also keeps the size of the program small
and reduces the requirements for program memory. Appendix A presents an example of
code for a radix-2 complex FFT, while Appendix B is a radix-4 complex FFT. The
program memory requirements for these programs (as well as others to be discussed later)
are given in Table 1.


Table 1. Program Memory Requirements for the Core of the FFT and Hartley
Transforms


Routine Type                  Program Size

Radix-2, complex FFT          50 words
Radix-4, complex FFT          170 words
Radix-2, real FFT             68 words
Radix-2, real inverse FFT     76 words
Hartley transform             71 words


The numbers in the table correspond only to the core program and do not include
the sine/cosine tables for the twiddle factors, any input/output, or any bit-reversing
operations. Note also that they are independent of the FFT data size.


The data memory requirements are, of course, dependent on the FFT size. The
maximum length of a complex, radix-2 FFT that can be implemented entirely on the internal
memory of the TMS320C30 is 1024 points. In the present implementation, the 1024-point
radix-4 FFT requires a few more locations (about 7) than are available on-chip.


The code (provided in the appendices) has been written to be independent of the
FFT length. The length N, together with the sine/cosine tables for the twiddle factors,
should be provided separately to maintain the generic nature of the core FFT program.
An example of a file with the sine/cosine tables for a 64-point FFT is given in
Appendix F. Note that the FFT size and the number of stages are declared .global in both files
(i.e., the main routine and the file with the table) so that the core program gets the actual
values during linking.


To reduce the storage requirements of a sine/cosine table, a full sine and a cosine
cycle are overlapped. The table stores 5/4 of a full sine wave, with the cosine table
starting with a phase delay of 1/4 cycle from the sine table. This table size is larger than
actually needed, and it is selected merely for convenience in testing the algorithms. The
minimum table size for a radix-2 complex FFT includes 1/2 of a full sine wave and 1/2
of a full cosine wave. If these two half waves are combined using the above quarter-cycle
phase delay, the minimum table size for this kind of FFT is 3/4 of a full sine wave. For
instance, for a 1024-point FFT, the table can be the first 768 points of a sine wave, where
a full cycle would be 1024 points. In the case of a radix-4 complex FFT, the minimum
table size should include 3/4 of a sine and 3/4 of a cosine wave. Overlapping these
requirements, we get the minimum table size of a radix-4 algorithm to be one full sine wave.


An example of a linking file is also included in Appendix F to show how the
different segments are assigned. For a complete description of the assembler and linker, consult
the corresponding manual [6].

The timing of the FFT routines was done using the cycle-counting capability of
the TMS320C30 simulator. For the conversion of the number of cycles into seconds, a
cycle time of 50 ns was used. The timing refers only to the core FFT computation, ignoring
read-in and write-out requirements, since such requirements are application-dependent.
Also, no bit reversal is counted (although it may be included in the program), since it
is performed as part of the read-in or write-out. Table 2 gives the timing for the different
FFT routines and for the Hartley transform.


Table 2. FFT Timing in Milliseconds†


FFT     Radix-2        Radix-4        Radix-2     Radix-2 Real    Hartley
Size    Complex FFT    Complex FFT    Real FFT    Inverse FFT     Transform

64
128
256


†Improvements have been made and are shown in this table. You may obtain the latest code from
the BBS, (713) 274-2323.

The last entry in this table represents the timing of the radix-2, DIT routine generated
at the University of Erlangen [18] and given in Appendix A. These numbers are typically
used for benchmarking.


Implementation of Real FFT 


The development of FFT algorithms is centered mostly around the assumption that
the input sequence consists of complex numbers (as does the output). This assumption
guarantees the generality of the algorithm. However, in a large number of actual
applications, the input is a sequence of real numbers. If this condition is taken into consideration,
additional computational savings can be achieved because the FFT of a real sequence
demonstrates the following symmetries: Assuming that the FFT output X(k) is complex,

$$X(k) = R(k) + j\,I(k) \qquad (7)$$

and that the sequence has length N, R(k) and I(k) should satisfy the following relations:

$$R(k) = R(N-k), \quad k = 1, \ldots, N/2-1 \qquad (8)$$

$$I(k) = -I(N-k), \quad k = 1, \ldots, N/2-1 \qquad (9)$$

$$I(0) = I(N/2) = 0 \qquad (10)$$

In other words, the real part of the transform is symmetric around zero frequency,
while the imaginary part is antisymmetric. Similar conditions hold if the transform is
expressed in terms of magnitude and phase.


The savings are due to the fact that not all points need to be computed. Since the
points that are not computed do not need to be saved either, there are also storage savings. An
efficient algorithm for real-valued FFTs is described in [10]. This algorithm was
implemented in the present study in such a way that, given the sequence of N real numbers
x(0), x(1), ..., x(N-1), the resulting FFT, consisting of complex numbers, is stored as
R(0), R(1), ..., R(N/2), I(N/2-1), I(N/2-2), ..., I(1). R(k) and I(k) represent the real and
imaginary parts of the complex number X(k). Figure 11 shows the memory arrangement
for the FFT. Note that the input to the real FFT should be bit-reversed, but the bit
reversal can be done as the data is brought in. With this arrangement, an N-point FFT uses
exactly N memory locations. If the full array X(k) is needed, the following relations should
be used:

$$X(0) = R(0) \qquad (11)$$

$$X(k) = R(k) + j\,I(k), \quad k = 1, \ldots, N/2-1 \qquad (12)$$

$$X(N/2) = R(N/2) \qquad (13)$$

$$X(k) = R(N-k) - j\,I(N-k), \quad k = N/2+1, \ldots, N-1 \qquad (14)$$




Figure 11. Memory Arrangement of a Real FFT. 


It is expected that, in most signal processing applications, there will be no need to
reconstruct the full X(k) array and that the output shown in Figure 11 will be sufficient
for any further processing.
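
When the full array is needed after all, Equations (11) through (14) translate into a short unpacking routine. The C sketch below assumes the packed layout described above, R(0), ..., R(N/2) followed by I(N/2-1), ..., I(1), so that I(k) is found at position N-k; the function name `unpack_real_fft` is hypothetical and the code is not part of Appendix C.

```c
#include <assert.h>
#include <complex.h>
#include <stddef.h>

/* Expand the N-word packed output of the real FFT into N complex values.
   packed[k]     = R(k) for k = 0 .. N/2
   packed[N - k] = I(k) for k = 1 .. N/2 - 1
   (Illustrative sketch of Equations (11)-(14).) */
void unpack_real_fft(const float *packed, float complex *X, size_t n)
{
    assert(n >= 2 && n % 2 == 0);
    X[0] = packed[0];                    /* Equation (11) */
    X[n / 2] = packed[n / 2];            /* Equation (13) */
    for (size_t k = 1; k < n / 2; k++) {
        float re = packed[k];            /* R(k) */
        float im = packed[n - k];        /* I(k) */
        X[k] = re + im * I;              /* Equation (12) */
        X[n - k] = re - im * I;          /* Equation (14), complex conjugate */
    }
}
```

The upper half of the spectrum is just the complex conjugate of the lower half, which is why the packed form loses no information.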


Appendix C contains TMS320C30 routines implementing a radix-2 real FFT and
its inverse. The implementation of the forward transformation is based on the FORTRAN
programs contained in [10]. The inverse transformation assumes that the input data are
given in the order presented at the output of the forward transformation and produces a
time signal in the proper order (i.e., bit-reversing takes place at the end of the program).
Viewed another way, the inverse real FFT operates as shown in Figure 11 but with the
arrows reversed (and inverse FFT taking the place of the FFT).


The timing for the real-valued FFT (both forward and inverse) is included in Table
2, and the corresponding program sizes are shown in Table 1. As you can see, the
real-valued FFT is considerably faster than the corresponding complex FFT because not all
the computations need be performed. Furthermore, there are data storage savings because
only half the values must be stored. As a result, the maximum length of real-valued FFT
that can be implemented on the TMS320C30 without using any external memory is 2048
points. Of course, if all the values are needed, they can be recovered using the symmetry
conditions mentioned earlier. To achieve the efficiencies of real FFT and not use any
extra memory locations during the computation, the decimation-in-time method is applied
[10]. Decimation in time requires the bit-reversal operation in the forward transform to
be performed at the beginning of the program rather than at the end. The reverse is true
for bit-reversing in the inverse transform.


The Discrete Hartley Transform 


Another transform that has attracted attention recently is the Discrete Hartley
Transform (DHT) [11, 12]. The DHT is applicable to real-valued signals and is closely
related to the real-valued FFT. Comparison of references [10] and [12], which describe the
implementation of the two algorithms in FORTRAN programs, suggests that their
implementation on the TMS320C30 should be similar. And indeed, this is the case.


The DHT pair is defined for a real-valued sequence x(n), n = 0, ..., N-1, by
the following equations:

$$H(k) = \sum_{n=0}^{N-1} x(n)\, \mathrm{cas}(2\pi kn/N), \quad k = 0, \ldots, N-1 \qquad (15)$$

$$x(n) = \frac{1}{N} \sum_{k=0}^{N-1} H(k)\, \mathrm{cas}(2\pi kn/N), \quad n = 0, \ldots, N-1 \qquad (16)$$

where cas(x) = cos(x) + sin(x). The DHT demonstrates a symmetry that is convenient
for implementations: The same program can be used for both the forward and the inverse
transforms, and the result is correct within a scale factor. Also, the real FFT and the DHT
can be derived from each other [12].


A radix-2 Hartley transform was implemented on the TMS320C30, and the
corresponding code is included in Appendix D. This code follows the structure of the real
FFT in Appendix C. Tables 1 and 2 show the program memory requirements and the
timing for the execution of Hartley transforms of different sizes. The sine/cosine table
sizes are the same as in the case of a real FFT.


The Discrete Cosine Transform 


The Discrete Cosine Transform (DCT), since its introduction in 1974 [13], has gained 
popularity in speech and image processing applications because of its near-optimal behavior. 
This discussion is based on the paper by Lee [14]. The DCT code was developed and 
implemented by Paul Wilhelm of the University of Washington. 


If x(n), n = 0, ..., N-1 is a time-domain signal and X(k) is the corresponding DCT,
x(n) and X(k) are related by the following equations:

$$X(k) = \frac{2}{N}\, e(k) \sum_{n=0}^{N-1} x(n) \cos\frac{(2n+1)k\pi}{2N} \qquad (17)$$

$$x(n) = \sum_{k=0}^{N-1} e(k)\, X(k) \cos\frac{(2n+1)k\pi}{2N} \qquad (18)$$

$$e(0) = 1/\sqrt{2} \qquad (19)$$

$$e(k) = 1, \quad \text{for } k \neq 0 \qquad (20)$$


Appendix E shows an implementation of the DCT based on the paper by Lee [14]. 
The appendix contains the algorithms for both the forward and the inverse transformations 
and an example of a table for a 16-point DCT. Note that, because of the structure of the 
algorithm, the cosine table needed contains actually the inverses of the cosines (within 
a scale factor), and it is not stored in the natural order. Instead, it is generated by the 
following C pseudocode: 


    for (k = 2, i = 0; k <= N/2; k *= 2)
        for (j = k/2; j < N/2; j += k) {
            cos_table[i++] = 1/(2*cos(j*pi/(2*N)));
            cos_table[i++] = 1/(2*cos((N-j)*pi/(2*N)));
        }

    cos_table[N-2] = cos(pi/4);
    cos_table[N-1] = 2.0/N;


The last entry to the table is not part of the cosine itself; it is a constant that is used 
by the algorithm, and it is placed at the end of the cosine table for convenience. 


Table 3 shows the timing of the forward and inverse transforms for different transform
lengths. The difference in the timing between the forward and the inverse transforms is
due to the fact that more time was expended to optimize the performance of the inverse
transform. Since four of the smallest butterflies were done simultaneously in the center
program loop, the minimum permissible array size to be transformed is 8.


Table 3. DCT Timing in Milliseconds


Transform Size    Forward Transform    Inverse Transform


Other Related Transforms 


In addition to the FFT types mentioned earlier (complex, real, decimation-in-time, 
decimation-in-frequency, etc.), newer forms of the FFT have been developed to reduce 
the computational load. One of the latest in the literature is the Split-Radix FFT. The Split- 
Radix FFT [16] has the lowest number of multiplies and adds of any known algorithm. 
It achieves this efficiency by combining certain radix-2 and radix-4 butterflies, but, as 
a result, the classical concept of FFT stages is lost. The new structure uses a rather 
complicated indexing scheme, which is the price paid for the reduced multiplies/adds. 
Since, on the TMS320C30, multiplies/adds are not more expensive computationally than 
any other operation, the indexing scheme wipes out the gains of the reduced arithmetic. 
Actually, an implementation of the split-radix FFT showed it to be slower than the radix-2 
FFT, one of the main reasons being that the block-repeat structure could no longer be 
used effectively. 


A common question is what the different benchmark numbers mean. A 
useful comparison of execution times for different algorithms on different machines has 
been made [17]. Table 4 presents the small segment of that information that is relevant 
to the present discussion: the timing in seconds for the radix-8, mix-radix, and split-radix 
algorithms as implemented on various machines. Different operating systems and 
compilers were used, as shown. The execution times of Table 4 should be compared 
with the 0.0010055 s that it takes to implement a 1024-point, radix-2, real FFT on a 
TMS320C30. As can be seen, the TMS320C30 compares favorably to all the other machines 
investigated. 




Table 4. Execution Times in Seconds for a 1024-Point Real FFT. The Numbers Should 
Be Compared with 0.001055 s of a 1024-Point Real FFT on the TMS320C30 


Machine                                    Radix-8    Mix-radix    Split-radix 

VAX 750 UNIX BSD4.2 f77 
VAX 750 UNIX BSD4.2 f77 -O 
VAX 750 UNIX BSD4.3 f77 
VAX 750 UNIX BSD4.3 f77 -O 
VAX 785 ULTRIX f77 
VAX 785 ULTRIX f77 -O 
VAX 785 VMS FOR/NOOPTM 
VAX 785 VMS FOR/OPTM 
VAX 8600 VMS FOR/OPTM 
MICROVAX VMS FOR/NOOPTM 
MICROVAX VMS FOR/OPTM 
DEC-10 TOPS-10 FOR/NOOPTM 
DEC-10 TOPS-10 FOR/OPTM 
CDC 855 FTN5,OPT=0 
CDC 855 FTN5,OPT=1 
CDC 855 FTN5,OPT=2 
CDC 855 FTN5,OPT=3 
SUN 3/50 UNIX BSD4.2 f77 -O -f68881 
SUN 3/50 UNIX BSD4.2 f77 -f68881 
SUN 3/50 UNIX BSD4.2 f77 -O 
SUN 3/50 UNIX BSD4.2 f77 
SUN 3/160 UNIX BSD4.2 f77 
SUN 3/160 UNIX BSD4.2 f77 -pfa 
SUN 3/260 UNIX BSD4.3 f77 
SUN 3/260 UNIX BSD4.3 f77 -O 
Pyramid 90X UNIX BSD4.2 f77 -O 
Pyramid 90X UNIX BSD4.2 f77 
HP-1000 21MX-E FTN7X 
Apple MAC Microsoft FOR 
AST PC Microsoft FOR 


The TMS320C30 C Compiler 


The C compiler for the TMS320C30 permits easy porting of high-level language 
programs to the DSP device. If the CPU loading of a particular application is not very 
high, the C compiler can create programs that run on the TMS320C30 in real time. If, 
however, the result is non-realtime, it may be necessary to use assembly language for 
more efficient coding. 


In most cases, only a portion of the code needs to be written in assembly language. 
Typically, there are a few code segments where the device spends most of the time and 
which, when optimized in assembly language, yield the necessary performance 
improvement. By following the conventions outlined in the run-time environment of the 
C compiler [15], you can write these time-critical routines in assembly language and call 
them in a C program. This is also true for the FFT routines. In appendices A, B, and 
C, the radix-2, radix-4, and real FFT routines mentioned earlier are also put in a C-callable 
form by adding the necessary interface at the beginning and the end of the code. The tables 
with the sines and cosines are again assumed to be supplied during link time. 
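
To make the C-callable arrangement concrete, the following is a minimal sketch in portable C of a routine that a C program could call in place of the assembly version while the rest of an application is being developed. The name `fft_2_ref`, the argument order, and the explicit `sine` parameter are assumptions made for illustration; the assembly routines instead resolve their table at link time. The sketch further assumes a table holding N + N/4 samples of sin(2*pi*i/N), so that the cosine is read at an offset of N/4, and data stored as alternating real and imaginary values.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical portable stand-in for a C-callable complex radix-2 FFT.
   n    : number of complex points (power of two, at least 4)
   m    : log2(n)
   data : 2*n floats, real and imaginary parts alternating, transformed in place
   sine : n + n/4 samples of sin(2*pi*i/n); cos(2*pi*i/n) is sine[i + n/4] */
void fft_2_ref(size_t n, unsigned m, float *data, const float *sine)
{
    assert(n >= 4 && n == ((size_t)1 << m));

    /* Bit-reverse the input order so the result comes out in natural order. */
    for (size_t i = 0, j = 0; i < n; i++) {
        if (i < j) {
            float tr = data[2*i], ti = data[2*i + 1];
            data[2*i] = data[2*j];  data[2*i + 1] = data[2*j + 1];
            data[2*j] = tr;         data[2*j + 1] = ti;
        }
        size_t bit = n >> 1;
        while (bit != 0 && (j & bit)) { j &= ~bit; bit >>= 1; }
        j |= bit;
    }

    /* m stages of decimation-in-time butterflies. */
    for (size_t half = 1; half < n; half <<= 1) {
        size_t stride = n / (2 * half);            /* step through the table */
        for (size_t k = 0; k < half; k++) {
            float wr =  sine[k * stride + n / 4];  /* cosine part of the twiddle */
            float wi = -sine[k * stride];          /* minus sine: forward transform */
            for (size_t a = k; a < n; a += 2 * half) {
                size_t b = a + half;
                float tr = wr * data[2*b]     - wi * data[2*b + 1];
                float ti = wr * data[2*b + 1] + wi * data[2*b];
                data[2*b]     = data[2*a]     - tr;
                data[2*b + 1] = data[2*a + 1] - ti;
                data[2*a]     += tr;
                data[2*a + 1] += ti;
            }
        }
    }
}
```

A caller would pass the same array for input and output, for example `fft_2_ref(64, 6, data, sine_table)`. Once the application works, the call can be redirected to the assembly routine, provided the argument conventions of that routine are followed.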


Issues in FFT Implementation 


There are many ways of actually implementing the FFT code (and the other 
transformations), taking into consideration the different possibilities of program locations, 
the data locations, the ways of input and output, etc. Since it is impractical to cover every 
possible case, this report has concentrated on a configuration in which the use of external 
memory is minimized. With the source code and additional explanations provided, you 
should be able to customize the FFT implementation for a particular application. 


Use of External Memory 


In these implementations, only on-chip memory was used, which is why the 
maximum transform size considered was 1024 points long (2048 for a real transform). 
Often, though, applications call for use of external memory for program or data or both. 
When external memory is used, the structure of the code does not change at all; it is only 
the timing that may be affected. 


Fast external memory can be selected so that no wait states are necessary. But even 
when there are no wait states, accessing external memory may impose some limitations. 
For instance, you can make only one external memory access in a full cycle, but you can 
make two accesses of internal memory in each cycle. Also, because of multiplexing of 
the busses, pipeline conflicts may arise if both program and data are placed on the same 
external port. Resolution of such conflicts causes extra cycles for the execution. The section 
on pipelining in the TMS320C30 User's Guide explains in detail what kind of potential 
conflicts may occur. 


To minimize or avoid such conflicts, there are some simple steps that the designer 
can take. The TMS320C30 has three separate memory areas (one on-chip, one accessed 
by the primary bus, and one accessed by the expansion bus) that can be combined. For 
instance, the program can be placed on the expansion port and the data on the primary 
port. Or the data can first be brought into internal memory and then operated upon. 
Alternatively, the program may be relocated to internal memory. A related approach is 
to use the cache. All the transforms are implemented as loops that are executed many 
times. If you activate the on-chip cache after the first access of the code, the instructions 
execute from the cache instead of the external memory. 


If there are additional conflicts, they can typically be resolved by some rearrangement 
of the code. For instance, consecutively writing to external memory takes two cycles per 
write. If, however, a write is followed by some internal operation, then the second cycle 
of the write is transparent, and the actual cost is one cycle. 


Bit Reversal 


The TMS320C30 has a special form of the indirect addressing mode for the bit- 
reversing operation that is required at the beginning or the end of an FFT. Through this 
addressing mode, the scrambled data are accessed in their proper order. This addressing 
mode works as follows: 


Let ARn (n=0..7) be the auxiliary register pointing to the array with scrambled 
data. The index register IR0 contains a number equal to one-half the size of the FFT. 
Then, after every access of the data, ARn is incremented by IR0 using the construct 


*ARn++(IR0)B 


This causes the contents of ARn to be incremented by the contents of IR0, but if 
there is a carry in this incrementing, the carry propagates to the right instead of to the 
left. The result is the generation of the addresses in a bit-reversed order. The bit-reversed 
addressing mode works correctly if the array with the data is aligned in memory so that 
the first memory address is a multiple of the FFT size. This can be achieved if the first 
memory address has zeros for the last M bits, where M = log2(N), with N being the FFT 
size. For example, in the case of a 1024-point FFT, the last 10 bits of the memory address 
of the first datum should be zeros. 
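
The effect of the reverse-carry increment can be modeled in a few lines of C. This is only a sketch of what the addressing mode computes when the increment is a power of two (N/2 here); the function name `rev_carry_add` is illustrative and not part of any TI tool.

```c
#include <assert.h>

/* Add `step` to `addr` with the carry propagating toward the least
   significant bit. `step` must be a power of two (N/2 for an N-point
   array). Bits above `step` are never touched, which is why the array
   must start on an address that is a multiple of N. */
unsigned rev_carry_add(unsigned addr, unsigned step)
{
    assert(step != 0 && (step & (step - 1)) == 0);
    unsigned bit = step;
    while (bit != 0 && (addr & bit)) {
        addr &= ~bit;   /* this bit was set: clear it and carry to the right */
        bit >>= 1;
    }
    return addr | bit;  /* set the first clear bit; wraps to the base if none */
}
```

Starting from an aligned base address and calling the function repeatedly with `step` equal to 4 visits offsets 0, 4, 2, 6, 1, 5, 3, 7 for an 8-point array, which is the bit-reversed order, and then returns to the base.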


In the implementation of the complex FFT, the output is complex even when the 
input is real. So, there is a need to consider both the real and the imaginary parts of the 
data array. The above description of the bit-reversed addressing mode assumed that the 
real and the imaginary parts are stored as separate arrays in the memory. In this case, 
each of the arrays (real or imaginary parts) can be accessed as described. However, in 
most cases (including this report), the real and imaginary points alternate in the same array. 


In this arrangement, the following simple modification achieves the same goal: set IR0 
equal to N instead of N/2, and access the N points of the transform. At every access, the 
auxiliary register is pointing to the real part of the FFT. The imaginary part is located 
in the next higher location, and it can be easily accessed. 


With the bit-reversed addressing mode, the unscrambling of the data can take place 
when the FFT result is accessed for further processing or for I/O. It is possible, though, 
that certain applications demand the reordering of the data in the same array. Such a 
rearrangement can be done very simply for a complex FFT with the following code. 


; DO THE BIT-REVERSING EXPLICITLY 

        LDI     @FFTSIZ,RC      ; RC = FFT SIZE 
        SUBI    1,RC            ; RC SHOULD BE ONE LESS THAN DESIRED # 
        LDI     @FFTSIZ,IR0     ; IR0 = FFT SIZE 
        LDI     @INPUT,AR0 
        LDI     @INPUT,AR1 

        RPTB    BITRV 
        CMPI    AR1,AR0         ; EXCHANGE LOCATIONS ONLY 
        BGE     CONT            ; IF AR0<AR1 
        LDF     *AR0,R0 
||      LDF     *AR1,R1         ; EXCHANGE REAL PARTS 
        STF     R0,*AR1 
||      STF     R1,*AR0 
        LDF     *+AR0,R0 
||      LDF     *+AR1,R1        ; EXCHANGE IMAGINARY PARTS 
        STF     R0,*+AR1 
||      STF     R1,*+AR0 
CONT    NOP     *AR0++(2) 
BITRV   NOP     *AR1++(IR0)B 


Note that AR1 is pointing to the bit-reversed version of the address contained in 
AR0. For a real-valued FFT, or for FFTs that store the real and the imaginary parts in 
separate arrays, the real-FFT routine in Appendix C contains a modified example of the 
above code. 
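
For readers who want to check a reordering off-target, the same in-place exchange can be sketched in C. This is an illustrative equivalent, not a translation of the TI routine: the index `i` plays the role of AR0 (natural order), `j` plays the role of AR1 (bit-reversed order), and the function name is an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Reorder n complex points, stored as alternating real and imaginary
   floats (2*n values), into bit-reversed order in place. */
void bit_reverse_complex(float *data, size_t n)
{
    assert(n != 0 && (n & (n - 1)) == 0);      /* n must be a power of two */
    size_t j = 0;                              /* bit-reversed point index */
    for (size_t i = 0; i < n; i++) {           /* natural-order point index */
        if (i < j) {                           /* exchange each pair once only */
            float re = data[2*i], im = data[2*i + 1];
            data[2*i]     = data[2*j];
            data[2*i + 1] = data[2*j + 1];
            data[2*j]     = re;
            data[2*j + 1] = im;
        }
        /* Reverse-carry increment of j by n/2 (N words in the interleaved array). */
        size_t bit = n >> 1;
        while (bit != 0 && (j & bit)) { j &= ~bit; bit >>= 1; }
        j |= bit;
    }
}
```

The comparison `i < j` corresponds to the CMPI/BGE pair in the assembly code: a pair of locations is exchanged only when the natural-order pointer is below the bit-reversed one, so no pair is swapped twice.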


Use of DMA 


If the signal to be transformed arrives as a continuous stream of data, the DMA 
could be used to collect the new data while the data already collected are processed. In 
this case, the data source address of the DMA points to the memory location correspond- 
ing to a serial port, or to another port associated with an external device. The destination 
is a memory space designated for storage. 




There are two ways to use such buffers. One possibility is to designate one buffer 
as the temporary storage and the other buffer as the working area. When the storage buffer 
receives the necessary amount of data, the data is transferred to the working area, and 
the DMA starts refilling the storage buffer. Alternatively, the two buffers are considered 
equivalent: when the processor finishes processing and outputting the data from one and 
the DMA has filled the other, the two buffers switch functions; i.e., the DMA starts filling 
the first buffer while the CPU is processing the data in the buffer just filled. 
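
The second scheme, in which the two buffers simply exchange roles, can be sketched in C as follows. This is a sequential simulation under stated assumptions: `dma_fill` stands in for the DMA controller (which on the device runs concurrently with the CPU), the block size of 8 is arbitrary, and summing the samples stands in for the transform. All names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define BLOCK 8                     /* samples per block; illustrative value */

typedef struct {
    float buf[2][BLOCK];
    int   fill;                     /* index of the buffer being filled */
} PingPong;

/* Stand-in for the DMA: copy one block of input into the fill buffer. */
void dma_fill(PingPong *pp, const float *src)
{
    for (size_t i = 0; i < BLOCK; i++)
        pp->buf[pp->fill][i] = src[i];
}

/* Exchange roles; return the buffer just filled so the CPU can process it. */
float *swap_buffers(PingPong *pp)
{
    float *ready = pp->buf[pp->fill];
    pp->fill ^= 1;
    return ready;
}

/* Process a stream block by block; the sum stands in for the transform. */
float process_stream(const float *stream, size_t nblocks)
{
    assert(stream != NULL || nblocks == 0);
    PingPong pp;
    pp.fill = 0;
    float acc = 0.0f;
    for (size_t b = 0; b < nblocks; b++) {
        dma_fill(&pp, stream + b * BLOCK);
        float *work = swap_buffers(&pp);   /* DMA now targets the other buffer */
        for (size_t i = 0; i < BLOCK; i++)
            acc += work[i];
    }
    return acc;
}
```

Compared with the first scheme, no block is ever copied from a storage buffer to a working area; the cost is that processing of one block must finish before the DMA has filled the other.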


Test Vector 


For testing purposes, a vector with 64 (quasi-random) data points and the 
corresponding FFT values is given in Appendix F. If any of the routines is 
implemented, the test vectors can be used to verify that it functions correctly. 
Together with the test vectors, Appendix F gives a sine/cosine table for a 64-point 
transform, and the link command file for such a transform. 
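
Because the routines work in single-precision floating point, a computed result will generally not match the tabulated values bit for bit, so a comparison within a tolerance is appropriate. The following C sketch shows one way to do this; the function names and the choice of an absolute tolerance are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Largest absolute difference between computed and reference values. */
float max_abs_error(const float *computed, const float *reference, size_t len)
{
    assert(computed != NULL && reference != NULL);
    float worst = 0.0f;
    for (size_t i = 0; i < len; i++) {
        float d = computed[i] - reference[i];
        if (d < 0.0f) d = -d;
        if (d > worst) worst = d;
    }
    return worst;
}

/* Return 1 if every element agrees with the reference within tol, else 0. */
int vectors_match(const float *computed, const float *reference,
                  size_t len, float tol)
{
    return max_abs_error(computed, reference, len) <= tol;
}
```

For the 64-point complex test vector, `len` would be 128 (real and imaginary parts), and the tolerance should be chosen relative to the magnitude of the data.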


Summary 


This report examined implementations of fast transforms on the Texas Instruments 
TMS320C3x floating-point devices. The transforms considered were several forms of the 
FFT, the Discrete Hartley Transform, and the Discrete Cosine Transform. Because of 
the powerful architecture of the device, the implementation was done easily and efficiently. 
It was shown that a TMS320C30 executes the FFTs several times faster than large computers 
such as VAX and SUN workstations. With the availability of the C compiler, these routines 
can be put in C-callable form and be used to compute the corresponding transforms 
efficiently. 


Appendices 


Appendices A to F contain the TMS320C30 assembly language programs for the 
different algorithms considered. The contents of the appendices are as follows: 


Appendix A: Radix-2 Complex FFT. 
composed of 
A1: Generic Program to Do a Looped-Code Radix-2 FFT 
Computation on the TMS320C30. 
A2: fft_2 - Radix-2 Complex FFT to Be Called as a C 
Function. 
A3: Complex, Radix-2 DIT FFT - R2DIT.ASM. 
A4: Complex, Radix-2 DIT FFT - R2DITB.ASM. 
A5: TWID1KBR.ASM - Table with Twiddle Factors for an FFT 
up to a Length of 1024 Complex Points. 


Appendix B: Radix-4 Complex FFT. 
composed of 
B1: Generic Program to Do a Looped-Code Radix-4 FFT on the 
TMS320C30. 
B2: fft_4 - Radix-4 Complex FFT to Be Called as a C 
Function. 


Appendix C: Radix-2 Real FFT. 
composed of 
C1: Generic Program to Do a Radix-2 Real FFT Computation 
on the TMS320C30. 
C2: fft_rl - Radix-2 Real FFT to Be Called as a C Function. 
C3: Generic Program to Do a Radix-2 Real Inverse FFT 
Computation on the TMS320C30. 


Appendix D: Discrete Hartley Transform. 
composed of 
D1: Generic Program to Do a Radix-2 Hartley Transform on the 
TMS320C30. 


Appendix E: Discrete Cosine Transform. 
composed of 
E1: A Fast Cosine Transform. 
E2: A Fast Cosine Transform (Inverse Transform). 
E3: FCT Cosine Tables File. 
E4: Data File. 


Appendix F: Test Vectors, 64-Point Sine Table, Link Command File. 
composed of 
F1: Example of a 64-Point Vector to Test the FFT Routines. 
F2: File to Be Linked with the Source Code for a 64-Point, 
Radix-4 FFT. 
F3: Link Command File. 


The first three appendices contain the code for the radix-2, complex radix-4, and 
real radix-2 FFT transformations. These routines are given in both the regular form and 
in a C-callable form. Furthermore, the contents of a file with the twiddle factors are given, 
as well as an example of a link command file for a 64-point FFT. Note that the source 
code of these routines can be downloaded from the TI DSP bulletin board (BBS) by calling 
(713) 274-2323. For questions regarding the BBS, call the TI DSP hotline at (713) 274-2320. 
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Appendix A. Radix-2 Complex FFT 


Appendix A1. Generic Program to Do a Looped-Code Radix-2 FFT 
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FFT up to a Length of 1024 Complex Points. 
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Appendix B. Radix-4 Complex FFT 


Appendix B1. Generic Program to Do a Looped-Code Radix-4 FFT on 
the TMS320C30 


Appendix B2. fft_4 - Radix-4 Complex FFT to Be Called as a C Function 


*
*  APPENDIX B2 
*
*  FFT_4 --- RADIX-4 COMPLEX FFT TO BE CALLED AS A C FUNCTION. 
*
*  SYNOPSIS: 
*
*  int fft_4(N, M, data) 
*  int     N        FFT SIZE: N=4**M 
*  int     M        NUMBER OF STAGES = LOG4(N) 
*  float   *data    ARRAY WITH INPUT AND OUTPUT DATA 
*
*  DESCRIPTION: 
*
*  GENERIC FUNCTION TO DO A RADIX-4 FFT COMPUTATION ON THE TMS320C30. 
*  THE DATA ARRAY IS 2*N-LONG, WITH REAL AND IMAGINARY VALUES ALTERNATING. 
*  THE PROGRAM IS BASED ON THE FORTRAN PROGRAM IN THE BURRUS 
*  AND PARKS BOOK, P. 117. 
*
*  IN ORDER TO HAVE THE FINAL RESULT IN BIT-REVERSED ORDER, THE TWO 
*  MIDDLE BRANCHES OF THE RADIX-4 BUTTERFLY ARE INTERCHANGED DURING 
*  STORAGE. NOTE THIS DIFFERENCE WHEN COMPARING WITH THE PROGRAM ON 
*  P. 117. THE COMPUTATION IS DONE IN-PLACE, AND THE ORIGINAL DATA IS 
*  DESTROYED. BIT REVERSAL IS IMPLEMENTED AT THE END OF THE FUNCTION. 
*  IF THIS IS NOT NECESSARY, THIS PART CAN BE COMMENTED OUT. THE 
*  SINE/COSINE TABLE FOR THE TWIDDLE FACTORS IS EXPECTED TO BE SUPPLIED 
*  DURING LINK TIME, AND IT SHOULD HAVE THE FOLLOWING FORMAT: 
*
*          .global _sine 
*          .data 
*  _sine   .float  value1 = sin(0*2*pi/N) 
*          .float  value2 = sin(1*2*pi/N) 
*            . 
*            . 
*          .float  value(5N/4) = sin((5*N/4-1)*2*pi/N) 
*
*  THE VALUES VALUE1, VALUE2, ETC., ARE THE SINE WAVE VALUES. FOR AN 
*  N-POINT FFT, THERE ARE N+N/4 VALUES FOR A FULL AND A QUARTER PERIOD 
*  OF THE SINE WAVE. IN THIS WAY, A FULL SINE AND COSINE PERIOD ARE 
*  AVAILABLE (SUPERIMPOSED). 
*
*  STACK STRUCTURE UPON THE CALL: 
*
*          +-------------+ 
*  -FP(4)  |    DATA     | 
*  -FP(3)  |     M       | 
*  -FP(2)  |     N       | 
*  -FP(1)  | RETURN ADDR | 
*  -FP(0)  |   OLD FP    | 
*          +-------------+ 
*
*  REGISTERS USED: R0, R1, R2, R3, R4, R5, R6, R7, AR0, AR1, AR2, AR3, AR4, 
*  AR5, AR6, AR7, IR0, IR1, RC, RS, RE 
*
*  AUTHOR: PANOS E. PAPAMICHALIS 
*          TEXAS INSTRUMENTS                    OCTOBER 13, 1987 
*

[The instruction listing of this routine is not legible in the scanned page.] 


Appendix C. Radix-2 Real FFT 


Appendix C1. Generic Program to Do a Radix-2 Real FFT 
Computation on the TMS320C30 


Appendix C2. fft_rl - Radix-2 Real FFT to Be Called as a C Function 


*
*  APPENDIX C2 
*
*  FFT_RL --- RADIX-2 REAL FFT TO BE CALLED AS A C FUNCTION. 
*
*  SYNOPSIS: 
*
*  int fft_rl(N, M, data) 
*  int     N        FFT SIZE: N=2**M 
*  int     M        NUMBER OF STAGES = LOG2(N) 
*  float   *data    ARRAY WITH INPUT AND OUTPUT DATA 
*
*  DESCRIPTION: 
*
*  GENERIC FUNCTION TO DO A RADIX-2 FFT COMPUTATION ON THE TMS320C30. 
*  THE DATA ARRAY IS N-LONG, WITH ONLY REAL DATA. THE OUTPUT IS STORED 
*  IN THE SAME LOCATIONS WITH REAL AND IMAGINARY POINTS R AND I AS 
*  FOLLOWS: R(0), R(1), ..., R(N/2), I(N/2-1), ..., I(1) 
*
*  THE PROGRAM IS BASED ON THE FORTRAN PROGRAM IN THE PAPER BY SORENSEN 
*  ET AL., JUNE 1987 ISSUE OF TRANS. ON ASSP. THE COMPUTATION IS DONE 
*  IN-PLACE, AND THE ORIGINAL DATA IS DESTROYED. BIT REVERSAL IS 
*  IMPLEMENTED AT THE BEGINNING OF THE FUNCTION. IF THIS IS NOT 
*  NECESSARY, THIS PART CAN BE COMMENTED OUT. 
*  THE SINE/COSINE TABLE FOR THE TWIDDLE FACTORS IS EXPECTED TO BE 
*  SUPPLIED DURING LINK TIME, AND IT SHOULD HAVE THE FOLLOWING FORMAT: 
*
*          .global _sine 
*          .data 
*  _sine   .float  value1 = sin(0*2*pi/N) 
*          .float  value2 = sin(1*2*pi/N) 
*            . 
*            . 
*
*  THE VALUES VALUE1 TO VALUE(N/4) ARE THE FIRST QUARTER OF THE SINE 
*  PERIOD AND VALUE(N/4+1) TO VALUE(N/2) ARE THE FIRST QUARTER OF THE 
*  COSINE PERIOD. 
*
*  STACK STRUCTURE UPON THE CALL: 
*
*          +-------------+ 
*  -FP(4)  |    DATA     | 
*  -FP(3)  |     M       | 
*  -FP(2)  |     N       | 
*  -FP(1)  | RETURN ADDR | 
*  -FP(0)  |   OLD FP    | 
*          +-------------+ 
*
*  AUTHOR: PANOS E. PAPAMICHALIS 
*          TEXAS INSTRUMENTS                    OCTOBER 13, 1987 
*

[The instruction listing of this routine is not legible in the scanned page.] 


Appendix C3. Generic Program to Do a Radix-2 Real Inverse FFT 
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Appendix D. Discrete Hartley Transform 


Appendix D1. Generic Program to Do a Radix-2 Hartley Transform 
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Appendix E. Discrete Cosine Transform 


Appendix E1. A Fast Cosine Transform 


*
*  APPENDIX E1 
*
*  A FAST COSINE TRANSFORM 
*
*  BASED ON THE ALGORITHM OUTLINED BY BYEONG GI LEE IN HIS ARTICLE, FCT - A 
*  FAST COSINE TRANSFORM, PUBLISHED IN THE PROCEEDINGS OF THE IEEE 
*  INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, SAN 
*  DIEGO, CA, 19-21 MARCH 1984, P 28A.3/1-4 VOL. 2, (CH1945-5/84/0000-0299). 
*
*  LEE'S ALGORITHM HAS BEEN MODIFIED TO ALLOW NATURAL ORDER TIME DOMAIN 
*  COEFFICIENTS RATHER THAN THE BIT REVERSED ORDER INPUT SUGGESTED IN HIS 
*  ARTICLE. THE FREQUENCY DOMAIN COEFFICIENTS ARE IN BIT REVERSE ORDER. 
*
*  AUTHOR: PAUL WILHELM 
*

[The instruction listing of this routine is not legible in the scanned page.] 


Appendix E2. A Fast Cosine Transform (Inverse Transform) 
Appendix E3. FCT Cosine Tables File


*
*       APPENDIX E3
*
*       FCT COSINE TABLES FILE
*
*       TO BE LINKED WITH FCT SOURCE CODE FOR 32 POINT FCT.
*
*       COEFFICIENTS ARE 1/(2 * COS(N*PI/2M)), WHERE N IS A NUMBER FROM 1 TO
*       M-1. M IS THE ORDER OF THE TRANSFORM.
*
*       FOR A 32 POINT FCT, N IS IN THE FOLLOWING ORDER:
*            1, 15, 3, 13, 5, 11, 7, 9,
*            2, 14, 6, 10,
*            4, 12,
*            8
*
*       THE LAST VALUE IN THE TABLE IS 2/M.
*
        .global COS_TAB
        .global M
*
M       .set    16
*
        .data
*
COS_TAB
        .float  0.5024193
        .float  5.1011487
        .float  0.5224986
        .float  1.7224471
        .float  0.5669440
        .float  1.0606777
        .float  0.6468218
        .float  0.7881546
        .float  0.5097956
        .float  2.5629154
        .float  0.6013449
        .float  0.8999762
        .float  0.5411961
        .float  1.3065630
        .float  0.7071068
        .float  0.1250000
        .end
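
The coefficient rule stated in the header comment, 1/(2 * COS(N*PI/2M)) followed by a final entry of 2/M, can be checked with a short script. The following is a minimal sketch in Python, which the report itself does not use. The function name `fct_cos_table` is hypothetical, and the rule used to generate the order of N (odd multiples of 1, 2, 4, ... paired with M minus that value) is an assumption inferred from the sequence listed in the header, not something the report states.

```python
import math


def fct_cos_table(m):
    """Return (order, coeffs) for a table laid out like COS_TAB.

    Assumed ordering: for each step s = 1, 2, 4, ..., M/2, take
    k = s, 3s, 5s, ... up to M/2 and emit k followed by M - k
    (k is emitted only once when k == M - k).
    """
    order = []
    s = 1
    while s <= m // 2:
        for k in range(s, m // 2 + 1, 2 * s):
            order.append(k)
            if k != m - k:
                order.append(m - k)
        s *= 2

    # Coefficient from the header comment: 1 / (2 * cos(N * pi / (2 * M)))
    coeffs = [1.0 / (2.0 * math.cos(n * math.pi / (2 * m))) for n in order]

    # The last table entry is the scale factor 2/M.
    coeffs.append(2.0 / m)
    return order, coeffs


if __name__ == "__main__":
    order, coeffs = fct_cos_table(16)
    for n, c in zip(order + ["2/M"], coeffs):
        print(f"N = {n!s:>3}   .float {c:.7f}")
```

With M = 16 the sketch yields 15 cosine coefficients plus the scale factor, matching the 16 `.float` entries in the table, including 0.7071068 for N = 8 and 0.1250000 for 2/M.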


AFPENDIX E4 
DATA FILE 
-dlobal 


data 


float 
«float 
float 
float 
«float 
«float 
float 
.float 
float 
» float 
float 
float 
. float 
»float 
«float 
~ float 
2end 


COEFF 


Appendix E4. Data File 


Appendix F. Test Vectors, 64-Point Sine Table, Link Command File 


Appendix F1. Example of a 64-Point Vector to Test the FFT Routines 
Appendix F2. File to Be Linked with the Source Code for a 64-Point,
Appendix F3. Link Command File
* LINK COMMAND FILE
* DO NOT TYPE IN THESE FIRST SEVEN LINES
-o 12fopt64.out
12fopt.obj
sin64.obj
.text : {}
.data : {}
IN 809800h: { 12fopt.obj (IN) }
.bss 809C00h: {}